Fix scanning issues related to Unicode escapes in identifiers by graphemecluster · Pull Request #61042 · microsoft/TypeScript

graphemecluster · 2025-01-24T15:43:54Z

This PR fixes several issues related to identifier scanning:

For regular expression group names

Unicode escapes and extended Unicode escapes are not recognised at the beginning of the identifier, because I assumed scanIdentifier would handle that for me. In this PR, scanIdentifier itself is refactored, so this is fixed automatically.
Unlike identifiers anywhere else, surrogate pair escapes should be allowed within a RegExp group name, per tc39/ecma262#1869 (Normative: Fully specify legal escape sequences in RegExp capture group names). I discovered this problem when I ran into firasdib/Regex101#2372 ([JavaScript]: unicode escape sequence within identifier is always in unicode mode).
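For readers unfamiliar with surrogate pair escapes, here is a minimal sketch of how a lead/trail pair such as `\uD835\uDC9C` maps to a single code point. The helper names are hypothetical and are not taken from src/compiler/scanner.ts; the sketch shows only the arithmetic, not the PR's implementation.

```typescript
// Hypothetical helpers for illustration; not the scanner's actual code.
const isHighSurrogate = (codeUnit: number): boolean => codeUnit >= 0xD800 && codeUnit <= 0xDBFF;
const isLowSurrogate = (codeUnit: number): boolean => codeUnit >= 0xDC00 && codeUnit <= 0xDFFF;

/** Combines a lead/trail surrogate pair into one code point, or returns -1 if the pair is invalid. */
function combineSurrogatePair(high: number, low: number): number {
    if (!isHighSurrogate(high) || !isLowSurrogate(low)) {
        return -1;
    }
    return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

// The escapes \uD835\uDC9C together denote U+1D49C.
const combined = combineSurrogatePair(0xD835, 0xDC9C);
console.log(combined.toString(16)); // prints "1d49c"
```

A scanner that accepts such pairs in a group name would then test the combined code point, rather than each half, against the identifier tables.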

For all identifiers

Text between Unicode escapes is ignored. Fixes #61043 (Text between Unicode escapes within an identifier is skipped).

- Added the `scanIdentifierStart` method
- Refactored `scanIdentifierParts` & `scanIdentifier`

Implementing the above automatically fixes another issue, namely that Unicode escapes & extended Unicode escapes are not recognised at the beginning of a RegExp group name.

graphemecluster

You might want to leave out the last commit when reviewing for a better diff.

graphemecluster · 2025-01-24T15:48:13Z

src/compiler/scanner.ts

-        // "-" and ":" are valid in JSX Identifiers
-        (identifierVariant === LanguageVariant.JSX ? (ch === CharacterCodes.minus || ch === CharacterCodes.colon) : false) ||
+        // "-" is valid in JSX Identifiers. ":" is part of JSXNamespacedName but not JSXIdentifier.
+        identifierVariant === LanguageVariant.JSX && ch === CharacterCodes.minus ||


Changing this line doesn't affect anything in the test suite, but I am not sure whether it affects code completion.
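To make the distinction in the new comment concrete, here is a rough sketch of the two JSX grammar productions it refers to: `-` may appear inside a JSXIdentifier, while `:` only joins two JSXIdentifiers into a JSXNamespacedName. The function names are hypothetical, and the ASCII-only character check is a deliberate simplification of the real identifier tables.

```typescript
// Simplified ASCII-only stand-in for the real identifier part check (an assumption).
function isAsciiIdentifierPart(ch: string): boolean {
    return /^[A-Za-z0-9_$]$/.test(ch);
}

// JSXIdentifier: an identifier start followed by identifier parts or "-".
function isJsxIdentifierText(name: string): boolean {
    if (!/^[A-Za-z_$]/.test(name)) {
        return false;
    }
    for (const ch of name) {
        if (!isAsciiIdentifierPart(ch) && ch !== "-") {
            return false;
        }
    }
    return true;
}

// JSXNamespacedName: exactly two JSXIdentifiers separated by one ":".
function isJsxNamespacedName(name: string): boolean {
    const parts = name.split(":");
    return parts.length === 2 && parts.every(part => isJsxIdentifierText(part));
}
```

Under this model `data-id` is a single JSXIdentifier, whereas `xlink:href` is not an identifier at all but a namespaced name made of two.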

graphemecluster · 2025-01-24T15:49:38Z

src/compiler/scanner.ts

                ch = peekExtendedUnicodeEscape();
-                if (ch >= 0 && isIdentifierPart(ch, languageVersion)) {
+                if (ch >= 0 && isIdentifierPart(ch, languageVersion, identifierVariant)) {
+                    result += text.substring(start, pos);


This is the line whose absence causes #61043.
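The effect of that one line can be illustrated with a simplified model of an identifier-part loop. This is only a sketch: it handles ASCII identifier characters and `\u{...}` escapes, the function name and the `flushBeforeEscape` parameter are invented for the demonstration, and the real scanner tracks far more state.

```typescript
// Simplified stand-in for the identifier part check (ASCII only; an assumption).
const isSimpleIdentifierPart = (ch: string): boolean => /^[A-Za-z0-9_$]$/.test(ch);

function scanIdentifierPartsSketch(text: string, flushBeforeEscape: boolean): string {
    let pos = 0;
    let start = 0; // start of the raw run that has not yet been copied into `result`
    let result = "";
    while (pos < text.length) {
        if (isSimpleIdentifierPart(text.charAt(pos))) {
            pos++;
            continue;
        }
        const escape = /^\\u\{([0-9A-Fa-f]+)\}/.exec(text.slice(pos));
        if (escape === null) break;
        const value = parseInt(escape[1], 16);
        if (value > 0x10FFFF) break;
        const decoded = String.fromCodePoint(value);
        if (!isSimpleIdentifierPart(decoded)) break;
        if (flushBeforeEscape) {
            // Counterpart of the added line: keep the raw text scanned since the last escape.
            result += text.substring(start, pos);
        }
        result += decoded;
        pos += escape[0].length;
        start = pos;
    }
    return result + text.substring(start, pos);
}

const source = "a\\u{62}c\\u{64}e"; // the characters a\u{62}c\u{64}e
console.log(scanIdentifierPartsSketch(source, true));  // prints "abcde"
console.log(scanIdentifierPartsSketch(source, false)); // prints "bde"
```

Without the flush, `start` is reset after each escape and the raw characters before it are never appended, which matches the symptom described in #61043.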

graphemecluster · 2025-01-24T15:50:46Z

src/compiler/scanner.ts

-        pos += charSize(ch);
-        return token; // Still `SyntaxKind.Unknown`
+        tokenFlags = TokenFlags.None;
+        return scanIdentifier(ScriptTarget.ESNext);


I tested and it does not affect anything in the test suite even if pos is not advanced.

graphemecluster · 2025-01-24T15:59:55Z

src/compiler/scanner.ts

+        return token = SyntaxKind.Identifier;
+    }
+
+    function scanIdentifier(languageVersion: ScriptTarget, identifierVariant?: LanguageVariant | "RegExpGroupName") {


Alternatively, either an internal value can be added to LanguageVariant, or an enum called IdentifierVariant can be created. For the latter, though, I am not sure whether isIdentifierPart and isIdentifierText should also be amended, since that might be a breaking change, at least at the type level.
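As a rough sketch of the second alternative, a dedicated `IdentifierVariant` could look like the following. Nothing here exists in the compiler: the member names and the two predicates are assumptions made for illustration, and a const object stands in for the enum so the snippet stays free of non-erasable syntax.

```typescript
// Hypothetical: stands in for an `enum IdentifierVariant`; not part of the TypeScript API.
const IdentifierVariant = {
    Standard: 0,
    JSX: 1,
    RegExpGroupName: 2,
} as const;
type IdentifierVariant = typeof IdentifierVariant[keyof typeof IdentifierVariant];

// "-" is only an identifier part in JSX identifiers.
function allowsHyphen(variant: IdentifierVariant): boolean {
    return variant === IdentifierVariant.JSX;
}

// Surrogate pair escapes are only accepted in RegExp group names.
function allowsSurrogatePairEscapes(variant: IdentifierVariant): boolean {
    return variant === IdentifierVariant.RegExpGroupName;
}
```

The trade-off raised in the comment is visible here: any public function that currently accepts a `LanguageVariant` would need a new parameter type, which is where the type-level break could come from.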

graphemecluster · 2025-01-24T16:10:19Z

tests/baselines/reference/regularExpressionGroupNameUnicodeEscapes.errors.txt

+    /(?<\u{D800}\u{DC00}>)\k<\u{D800}\u{DC00}>/;
+
+!!! error TS1514: Expected a capturing group name.
+                ~~~~~~~~
+!!! error TS1538: Unicode escape sequences are only available when the Unicode (u) flag or the Unicode Sets (v) flag is set.


The first escape sequence error is not reported because it starts at the same position as the TS1514 error. Is this really desirable? Personally, I would prefer an additional check that the last error is not of length zero, in which case the new error would not be considered overlapping.
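The proposed check can be sketched as follows. This is a hypothetical model of "drop an error that starts where the previous one did", not the compiler's actual error reporting code; the `ScanError` shape, the `addError` name, and the `ignoreZeroLengthLast` parameter are all assumptions.

```typescript
// Hypothetical error record; the real diagnostics carry more information.
interface ScanError {
    start: number;
    length: number;
    code: number;
}

function addError(errors: ScanError[], next: ScanError, ignoreZeroLengthLast: boolean): void {
    const last = errors[errors.length - 1];
    const overlaps = last !== undefined
        && next.start === last.start
        // Proposed extra check: a zero-length previous error does not count as overlapping.
        && !(ignoreZeroLengthLast && last.length === 0);
    if (!overlaps) {
        errors.push(next);
    }
}

const current: ScanError[] = [];
addError(current, { start: 4, length: 0, code: 1514 }, false);
addError(current, { start: 4, length: 8, code: 1538 }, false);
console.log(current.length); // prints 1

const proposed: ScanError[] = [];
addError(proposed, { start: 4, length: 0, code: 1514 }, true);
addError(proposed, { start: 4, length: 8, code: 1538 }, true);
console.log(proposed.length); // prints 2
```

With the extra check, the zero-length TS1514 error no longer suppresses a TS1538 error that begins at the same position.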

graphemecluster · 2025-11-04T07:52:53Z

@RyanCavanaugh I understand that your team is busy working on Corsa, but since you just pinged me regarding another regex issue, would it be possible for you to take some time to look at this pull request too?

Edit: Due to Ron’s unfortunate situation, please allow me to bring this to the attention of an extra person. @jakebailey?

jakebailey · 2025-11-17T23:13:53Z

I intend to review this, yes, though I will note that whatever we add here will have to be ported to the new Go compiler, so hopefully it's not too complicated or tied to JS strings.

graphemecluster added 4 commits January 23, 2025 03:19

Parse regular expression group names with surrogate pairs

da5df6e

Add test cases for identifiers with text in-between Unicode escapes

d7e371f

Move scanIdentifierParts & getIdentifierToken next to `scanIdenti…

81002e9

…fierStart`

graphemecluster mentioned this pull request Jan 24, 2025

Text between Unicode escapes within an identifier is skipped #61043

Open

typescript-bot added the "For Uncommitted Bug" label (PR for untriaged, rejected, closed or missing bug) Jan 24, 2025

RyanCavanaugh requested a review from rbuckton January 24, 2025 17:57

jakebailey requested review from jakebailey and removed request for rbuckton November 17, 2025 23:14

typescript-bot added the "For Backlog Bug" label (PRs that fix a backlog bug) and removed the "For Uncommitted Bug" label (PR for untriaged, rejected, closed or missing bug) Nov 17, 2025

jakebailey mentioned this pull request Nov 19, 2025

Implement regexp literal syntax checking microsoft/typescript-go#2026

Draft

graphemecluster wants to merge 4 commits into microsoft:main from graphemecluster:scan-identifiers-with-escapes

3 participants
